ABSTRACT: AN EVALUATION OF FITNESS REPORT SCALES


A sample of convenience was obtained in which 15 officers completed
(anonymously) fitness reports on each other. Fitness report scales
were examined to determine their quality on the basis of two statistical
considerations, a "discrimination" index and a "disagreement" index. It
was found that there was a greater spread in scores (fitness marks) when
a number of judges rate one individual than when an average judge rates
a number of individuals. The size and nature of the sample prohibit
generalizing from the study. The study demonstrates a type of analysis
that can be performed and the type of information that can be obtained
by studies of this type. Replications of the study are recommended.
Problem and Background

How can one evaluate fitness report scales? Ideally, in measuring the
effectiveness of a scale, one would have some objective measurement of
an officer's work performance to which fitness report scale marks would
be related. The fitness report scales whose marks more closely reflected
the objective measurement of work performance would, of course, be the
more desirable scales. But such an objective measurement of job
performance does not exist. If it did, there would be no need for
fitness report scales, which involve human judgment and all its
attendant errors, since the objective measurement of job performance
itself would serve any purpose for which the scales are used.

Since an objective measurement of work performance does not exist, other
bases for evaluating the scales must be used. One basis is a judgment of
their relevance, perhaps the most important aspect of any scale. Someone
has evidently judged that the scales on the fitness report are relevant
to the performance of officers; otherwise they would not have been
included on the form.

Another basis for evaluating scales is "statistical." This report is
concerned with the "statistical" evaluation of scales. Assuming that
qualitative differences between ratees actually exist, statistically
good scales have the following two characteristics:

a. "Discrimination" between individuals, so that individuals
   are not all rated at the same level.

b. Inter-rater reliability, or very little "disagreement" among
   raters when they are judging the same behavior.


These two statistical characteristics can then be used to evaluate
fitness report scales. The information needed to evaluate the
"disagreement" characteristic is not usually available.

A retired Commanding Officer of a destroyer made available the
information that makes evaluation on both characteristics possible.
Aboard his destroyer he had become intensely interested in the
judgmental evaluation of his officers. He had each of his fifteen
officers complete the then-current fitness report [NavPers 310
(Rev. 4-62), presented as the Appendix to this report] on each of the
other officers. The reports were completed as usual except that the
raters remained anonymous. A summary of the sample's characteristics
is presented in Table 1.

TABLE 1
Population Characteristics

Rank     Frequency     USN/USN(R)   Frequency     Designator   Frequency
ENS          4         USN              10        1100             9
LTJG         8         USN(R)            5        1105             5
LT           0                                    3100             1
LCDR         1
CDR          2
CAPT         0
Total       15         Total            15        Total           15

Procedure 

Points were assigned to each of the rating scales of items 14 through 20
as indicated on the report form. Item 14 consists of 9-point scales,
items 15 and 16 consist of 5-point scales, and item 20 consists of
7-point scales. Comparisons among scales were restricted to comparisons
within these three sets of scales (9-, 5-, and 7-point lengths).
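The report does not state why comparisons were kept within scales of equal length. One plausible reason, offered here as an illustration rather than as the authors' stated rationale, is that a raw standard deviation grows with the number of scale points: the same relative pattern of marks gives a larger standard deviation on a 9-point scale than on a 5-point scale. The function name spread_of_pattern and the bottom/middle/top pattern are hypothetical.

```python
from statistics import pstdev


def spread_of_pattern(points: int) -> float:
    """Population SD of three marks at the bottom, middle, and top of a
    scale that runs from 1 to `points` (a fixed, hypothetical pattern)."""
    middle = (1 + points) / 2
    return pstdev([1, middle, points])


for points in (9, 7, 5):
    # The SD grows in proportion to the scale range (points - 1).
    print(points, round(spread_of_pattern(points), 3))
```

Because the spread is proportional to the range, the 9-point result is exactly twice the 5-point result and the 7-point result is 1.5 times it, so raw indexes from scales of different lengths are not on a common footing. Dividing by (points - 1) would be one way to rescale them, though the report does not do this.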

Index numbers were generated to reflect the two statistical
characteristics of discrimination and disagreement. Table 2 shows the
computation procedure for obtaining the "discrimination index" and the
"disagreement index."

TABLE 2
Computation of Discrimination Index and Disagreement Index

Each scale was analyzed as follows:

                          RATEES
                   1     2     3     4     5      σ of each row
RATERS     A       X     X     X     X     X      σ_A
           B       X     X     X     X     X      σ_B
           C       X     X     X     X     X      σ_C
           D       X     X     X     X     X      σ_D
           E       X     X     X     X     X      σ_E

σ of each column   σ_1   σ_2   σ_3   σ_4   σ_5

Mean of σ_A through σ_E = Discrimination Index
Mean of σ_1 through σ_5 = Disagreement Index


Each scale was examined individually. "Discrimination" was first
determined for each rater. Even though the raters were not identified
on the reports, it was possible to group the reports of the same rater
by matching certain miscellaneous characteristics of the reports. The
standard deviation of the marks each rater assigned on the scale across
ratees (in Table 2, the standard deviation of each row) was computed; a
numerically high standard deviation indicates good discrimination. To
obtain a discrimination index for the scale across raters, a simple
average (mean) of these standard deviations was computed (in Table 2,
the average of the row standard deviations). An index of
"disagreement" was also generated for each scale. The standard
deviation of the marks the raters assigned to each ratee was computed
(in Table 2, the standard deviation of each column). To obtain a
"disagreement index" across all ratees, a simple average (mean) of these
standard deviations was computed (in Table 2, the average of the column
standard deviations). These disagreement indexes are influenced both by
agreement among raters in their relative ordering of ratees and by
agreement among raters in the absolute level of their ratings, the
aspect usually affected by leniency error.
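The procedure described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' own computation: the marks for one scale are held in a rater-by-ratee matrix laid out as in Table 2, None marks a missing entry (an officer's own cell, or a "not observed" box), the population standard deviation is used because the report does not say which form was computed, and a rater or ratee with fewer than two marks is skipped. The function names are hypothetical.

```python
from statistics import fmean, pstdev
from typing import Iterable, List, Optional, Sequence

# ratings[r][c] = points rater r gave ratee c on ONE scale; None = no mark.
Matrix = List[List[Optional[float]]]


def mean_of_sds(groups: Iterable[Sequence[Optional[float]]]) -> float:
    """Average the population SDs of the groups, ignoring missing marks.

    A group with fewer than two marks has no measurable spread and is
    skipped (an assumption; the report does not say how this was handled).
    """
    sds = []
    for group in groups:
        marks = [m for m in group if m is not None]
        if len(marks) >= 2:
            sds.append(pstdev(marks))
    return fmean(sds)


def discrimination_index(ratings: Matrix) -> float:
    """Mean over raters (rows) of the SD of each rater's marks across ratees."""
    return mean_of_sds(ratings)


def disagreement_index(ratings: Matrix) -> float:
    """Mean over ratees (columns) of the SD of the marks raters gave that ratee."""
    return mean_of_sds(zip(*ratings))


# Hypothetical 3 x 3 example; no officer rates himself, so the diagonal is None.
ratings = [
    [None, 5, 3],  # rater A
    [4, None, 2],  # rater B
    [5, 4, None],  # rater C
]
print(round(discrimination_index(ratings), 3))  # 0.833
print(round(disagreement_index(ratings), 3))    # 0.5
```

In this made-up example each rater spreads marks across ratees (mean SD about 0.83) more than the raters disagree about any one ratee (mean SD 0.5), which is the pattern a statistically good scale should show.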

Results 

The results of the analysis are shown in Table 3. Most of the scales
that were high in "disagreement" were low in "discrimination," and vice
versa. A scale that lacks either low disagreement or high discrimination
has reduced utility. Five of the scales were relatively favorable on
both the disagreement and discrimination indexes. They are:


14e  Performance - As ( ) Watch Officer

14f  Performance - Technical Specialty ( )

20k  Leadership - Personal Behavior

20l  Leadership - Military Bearing

20m  Leadership - Self-expression (oral)

They constitute the best of the scales as determined by this
statistical analysis.

Three of the scales were relatively poor on both disagreement and
discrimination. They are:

16c  Desirability - Foreign Duty

20a  Leadership - Professional Knowledge

20b  Leadership - Moral Courage

The most significant finding, however, is the similarity in level of the
"discrimination" and "disagreement" indexes. Ideally, judges would rate
an individual on a scale with perfect agreement; and, assuming that
individual differences exist on a scale, their ratings would reflect the
true range of individual differences on that scale. Of the 26 scales on
the fitness report, 17 have disagreement values that numerically exceed
their discrimination values. This finding indicates that for these 17
scales there is a greater spread in scores when a number of judges rate
one individual than when an average judge rates a number of individuals.
In other words, in this sample the raters disagree on individuals'
ratings on a scale to a greater extent than the average rater is able
to discriminate among individuals on that scale.
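The count of 17 can be checked directly against the index values. The list below is read from Table 3; the assignment of the Item 14 values to rows, with no indexes reported for 14c Airmanship, is an assumption about the scanned table. Ties (for example 20f, .87 against .87) are not counted as exceeding.

```python
# Index values read from Table 3: (scale, disagreement, discrimination).
# Assumption: 14c Airmanship has no reported indexes and is left out.
TABLE_3 = [
    ("14a", 1.30, 1.16), ("14b", 1.14, 1.11), ("14d", 1.21, 0.98),
    ("14e", 1.19, 1.16), ("14f", 1.03, 1.15), ("14g", 1.36, 1.33),
    ("14h", 1.37, 1.15), ("15", 0.61, 0.72),
    ("16a", 0.84, 0.81), ("16b", 0.82, 0.81), ("16c", 0.84, 0.78),
    ("20a", 0.89, 0.80), ("20b", 0.89, 0.80), ("20c", 0.79, 0.76),
    ("20d", 0.91, 0.89), ("20e", 0.89, 0.88), ("20f", 0.87, 0.87),
    ("20g", 0.82, 0.78), ("20h", 0.84, 0.79), ("20i", 0.85, 0.80),
    ("20j", 0.90, 0.98), ("20k", 0.79, 0.85), ("20l", 0.82, 0.94),
    ("20m", 0.84, 0.84), ("20n", 0.73, 0.73),
]

# A scale counts when raters disagree about one ratee more than an average
# rater spreads marks across ratees; equal values do not count.
exceeding = [scale for scale, disagree, discrim in TABLE_3 if disagree > discrim]
not_exceeding = [scale for scale, disagree, discrim in TABLE_3 if disagree <= discrim]

print(len(exceeding), "of", len(TABLE_3), "scales with indexes")
print("not exceeding:", ", ".join(not_exceeding))
```

With these values the comparison returns the 17 scales cited in the text; the eight that do not exceed are 14f, 15, 20f, 20j, 20k, 20l, 20m, and 20n.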
TABLE 3
Disagreement Index and Discrimination Index
of Each Fitness Report Scale

Item                                              Scale   Disagreement  Discrimination
No.   Title                                       Points  Index(1)      Index(2)

14    Performance
        (a) Present Assignment                      9       1.30          1.16
        (b) Shiphandling and Seamanship             9       1.14          1.11
        (c) Airmanship                              9        --            --
        (d) Collateral Duties                       9       1.21           .98
        (e) As Watch Officer                        9       1.19          1.16
        (f) Technical Specialty ( )                 9       1.03          1.15
        (g) Command Potential or Ability            9       1.36          1.33
        (h) Administrative & Management Ability     9       1.37          1.15
15    Overall Evaluation                            5        .61           .72
16    Desirability
        (a) Operational                             5        .84           .81
        (b) Staff or Administrative                 5        .82           .81
        (c) Foreign Duty                            5        .84           .78
20    Leadership
        (a) Professional Knowledge                  7        .89           .80
        (b) Moral Courage                           7        .89           .80
        (c) Loyalty                                 7        .79           .76
        (d) Force                                   7        .91           .89
        (e) Initiative                              7        .89           .88
        (f) Industry                                7        .87           .87
        (g) Imagination                             7        .82           .78
        (h) Judgment                                7        .84           .79
        (i) Reliability                             7        .85           .80
        (j) Cooperation                             7        .90           .98
        (k) Personal Behavior                       7        .79           .85
        (l) Military Bearing                        7        .82           .94
        (m) Self-expression (oral)                  7        .84           .84
        (n) Self-expression (written)               7        .73           .73

Notes:
(1) Mean standard deviation of ratings on same subjects by different raters.
(2) Mean standard deviation of ratings for different subjects by same raters.
Conclusions and Recommendations

For this specific sample, the "statistically desirable" qualities of
each scale were determined and the five "statistically best" scales
were identified. In general, it was found that raters differed among
themselves in the ratings they assigned to the same degree that an
average rater discriminated among ratees. Some implications of this
general finding are that (1) fitness marks should be interpreted as
being highly dependent on the particular rater involved, and (2) there
is a need for training of raters and/or better definition of scales so
that inter-rater agreement would be increased. It is recognized that
only one of the 15 raters in this study (the Commanding Officer) was a
practiced rater. But since specific training in rating is not normally
provided for officers who will be expected to complete fitness reports,
the lack of experience in 14 of the 15 raters of this study may not
have reduced the representativeness of this sample.

It would be expected that ratings by peers would differ somewhat from
ratings by superiors or ratings by subordinates. This study combined all
three kinds (unavoidably, because the raters were anonymous), and this
undoubtedly accounts for some of the unreliability among raters. The
accumulation of evidence from many such studies in which raters could be
identified would reveal the specific ways in which superiors, peers, and
subordinates differ in their ratings. Statistical corrections could then
be applied in order to obtain a better estimate of inter-rater
reliability.
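The report leaves the form of such corrections open. One simple correction of this general kind, sketched here only as an illustration and not as the report's proposal (which would depend on whether a rater is a superior, peer, or subordinate), is to subtract each rater's own mean mark from that rater's marks before computing the disagreement index. This removes differences in absolute level, the leniency component noted in the Procedure section, and leaves only disagreement about the relative ordering of ratees. The helper names and the two-rater data are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Optional

Matrix = List[List[Optional[float]]]  # ratings[rater][ratee]; None = no mark


def mean_sd(groups) -> float:
    """Mean population SD over groups with at least two non-missing marks."""
    sds = []
    for group in groups:
        marks = [m for m in group if m is not None]
        if len(marks) >= 2:
            sds.append(pstdev(marks))
    return fmean(sds)


def discrimination(ratings: Matrix) -> float:
    return mean_sd(ratings)  # rows: one rater, many ratees


def disagreement(ratings: Matrix) -> float:
    return mean_sd(zip(*ratings))  # columns: one ratee, many raters


def center_by_rater(ratings: Matrix) -> Matrix:
    """Subtract each rater's mean mark from that rater's marks.

    Hypothetical leniency adjustment: it removes differences in the absolute
    level of a rater's marks but keeps the rater's ordering of ratees.
    """
    centered = []
    for row in ratings:
        level = fmean([m for m in row if m is not None])
        centered.append([None if m is None else m - level for m in row])
    return centered


# Rater B orders the ratees exactly as rater A does but marks two points higher.
ratings = [
    [1, 2, 3],  # rater A
    [3, 4, 5],  # rater B (more lenient)
]
adjusted = center_by_rater(ratings)
print(disagreement(ratings), disagreement(adjusted))  # 1.0 0.0
print(round(discrimination(ratings), 3), round(discrimination(adjusted), 3))
```

Because the two raters differ only in level, the raw disagreement index is 1.0 while the centered one is 0.0. The discrimination index is unchanged, since subtracting a constant from a rater's row does not alter that row's standard deviation. With missing cells, each rater's mean is taken over a different set of ratees, which this sketch does not correct for.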

The sample size of this study was too small to permit justifiably
generalizing from the results. This study provides, however, a
demonstration of the type of analysis that can be performed and the
type of information that can be obtained by studies of this type.
Replications of this study within small clusters of officers who are
familiar with each other's job performance would permit the
accumulation of information from which generalizations could reasonably
be made.
APPENDIX

REPORT ON THE FITNESS OF OFFICERS

WORKSHEET

1. NAME (Last, first, middle)
2. GRADE
3. USN(R)
4. DESIGNATOR
5. FILE NUMBER
6. SHIP OR STATION
7. DATE REPORTED PRESENT DUTY STATION
8. OCCASION FOR REPORT
   [ ] PERIODIC   [ ] DETACHMENT OF REPORTING SENIOR   [ ] DETACHMENT OF OFFICER
9. TYPE OF REPORT
   [ ] REGULAR   [ ] CONCURRENT   [ ] SPECIAL
10. PERIOD OF REPORT
   FROM:                    TO:
11. DUTIES (List principal duties assigned and the number of months during
    the period for which assigned)
12. EMPLOYMENT OF COMMAND DURING PERIOD OF THIS REPORT
13. REFERENCE HERE AND APPEND ANY COMMENDABLE OR ADVERSE REPORTS ON THIS
    OFFICER RECEIVED DURING THE PERIOD OF THIS REPORT
14. PERFORMANCE OF DUTIES (Evaluate his performance of duty in comparison
    with other officers of his grade and approximate length of service)

    Columns: NOT OBS. OR N.A., then from highest to lowest:
      Outstanding performance.
      Excellent performance. Frequently demonstrates outstanding performance.
      Very good performance. Frequently demonstrates excellent performance.
      Satisfactory performance. Basically qualified.
      Inadequate performance. He is not qualified. (Adverse)

    DUTY ASSIGNMENT
      (a) PRESENT ASSIGNMENT
      (b) SHIPHANDLING AND SEAMANSHIP
      (c) AIRMANSHIP
      (d) COLLATERAL DUTIES
      (e) AS __________ WATCH OFFICER
      (f) TECHNICAL SPECIALTY (__________)
      (g) COMMAND POTENTIAL OR ABILITY
      (h) ADMINISTRATIVE AND MANAGEMENT ABILITY
15. OVERALL EVALUATION: (a) In comparison with other officers of his grade
    and approximate length of service, how would you designate this officer?
    (b) For this report period indicate in (b) how many officers of his grade
    you have designated in each category of (a).

    Columns: NOT OBSERVED, then from highest to lowest:
      One of the highly outstanding officers I know
      A very fine officer of great value to the service
      A dependable and typically effective officer
      An acceptable officer
      Unsatisfactory (Adverse)
16. DESIRABILITY: Considering (1) the possible requirements of war and peace,
    (2) this officer's professional and technical competence, and (3) the
    adaptability of this officer to the varying conditions of naval service,
    indicate your attitude toward having this officer under your command in
    the following types of assignments:

    Columns: NOT OBSERVED, then from highest to lowest:
      Particularly desire
      Prefer to most
      Pleased to have
      Satisfied to have
      Prefer not to have (Adverse)

      (a) OPERATIONAL
      (b) STAFF OR ADMINISTRATIVE
      (c) FOREIGN DUTY
17. ENTRIES ON THIS REPORT ARE BASED ON (Check appropriate box)
    [ ] DAILY CONTACT AND CLOSE OBSERVATION
    [ ] FREQUENT OBSERVATION
    [ ] INFREQUENT OBSERVATION
    [ ] RECORDS AND REPORTS ONLY
18. FOR FUTURE ASSIGNMENTS: Based on your observations, for what type of duty
    do you consider him best qualified for his next assignment at sea and
    shore?
    Comment, if appropriate
19. NAME, GRADE, FILE NUMBER, DESIGNATOR AND OFFICIAL TITLE OF REPORTING SENIOR
20. LEADERSHIP: In comparison with other officers of his grade and
    approximate length of duty assignment, to what degree has this officer
    exhibited the following qualities of leadership?

    DEFINITIONS
      OUTSTANDING     - ONE out of 100 - Exceeds ALL others
      EXCEPTIONAL     - One of the next top FEW - Extraordinary
      SUPERIOR        - ABOVE the great MAJORITY
      EXCELLENT       - EQUAL to the majority
      ACCEPTABLE      - BELOW the majority
      MARGINAL        - Barely satisfactory
      UNSATISFACTORY  - (Adverse)

    (a) PROFESSIONAL KNOWLEDGE (Comprehension of all aspects of the profession)
    (b) MORAL COURAGE (To do what he ought to do regardless of consequences
        to himself)
    (c) LOYALTY (His faithfulness and allegiance to his shipmates, his command,
        the service and the nation)
    (d) FORCE (The positive and enthusiastic manner with which he fulfills his
        responsibilities)
    (e) INITIATIVE (His willingness to strike out and accept responsibility)
    (f) INDUSTRY (The zeal exhibited and energy applied in the performance of
        his duties)
    (g) IMAGINATION (Resourcefulness, creativeness, and capacity to plan
        constructively)
    (h) JUDGMENT (His ability to develop correct and logical conclusions)
    (i) RELIABILITY (The dependability and thoroughness exhibited in meeting
        responsibilities)
    (j) COOPERATION (His ability and willingness to work in harmony with others)
    (k) PERSONAL BEHAVIOR (His demeanor, disposition, sociability and sobriety)
    (l) MILITARY BEARING (His military carriage, correctness of uniform,
        smartness of appearance and physical fitness)
    (m) SELF-EXPRESSION (ORAL) (His ability to express himself orally)
    (n) SELF-EXPRESSION (WRITTEN) (His ability to express himself in writing)

COMMENTS: (Reporting seniors are encouraged to discuss this report with the
officer, but not necessarily show it.)
  (a) Make comments regarding any strengths, special accomplishments,
      contributions to the Naval and National service, or minor weaknesses.
      (Minor weaknesses must be discussed with the officer.)
      Have minor weaknesses been discussed with officer?
      [ ] YES   [ ] NO   [ ] NOT APPLICABLE
 *(b) ADVERSE COMMENTS, if any. Comments in this section are mandatory for
      adverse or unsatisfactory marks in sections 14, 15, 16 and 20. Reports
      containing adverse matter must be referred for statement pursuant to
      Art. 1701.8, Navy Regulations. Statement of officer must be attached to
      this report. (Marks in starred (*) boxes are adverse.)
      Has officer seen this report?   [ ] YES   [ ] NO
  (c) What has been the trend of his performance since your last report?
      [ ] FIRST REPORT   [ ] CONSISTENT   [ ] DECLINING

22. DATE FORWARDED            SIGNATURE OF REPORTING SENIOR

23. CONCURRENT REPORT:
    DATE FORWARDED            SIGNATURE OF REGULAR REPORTING SENIOR